Evolving Perl code for protein secondary structure prediction
Abstract
Progress in the area of secondary structure prediction has been frustratingly slow[6]. The most accurate predictors at the moment are trained to predict one of three secondary structural states (helix, strand or coil) for each residue at position i using sequence information from a “window” of residues i ± 7. Information from more distant sequence positions should improve predictions further, since it is assumed that non-local interactions, like those that occur in sheet formation, can modulate the innate secondary structure preferences of a residue and its near neighbours. However, simply using a larger window does not help. First, the information content decreases rapidly as one moves away from i, because the likelihood that these residues are close in 3D space also diminishes. Second, a fixed window is poorly suited to capturing information from variable-length secondary structures.
Recently, attempts have been made to incorporate non-local information in secondary structure predictions. Baldi and coworkers[5] have used recurrent multi-pass neural networks and have shown that information from residues i ± 15 influences their predictions. Bystroff and coworkers[2] have taken another approach, which is to combine local predictors for secondary and supersecondary structures into a single large hidden Markov model that simultaneously takes into account context effects throughout the sequence. The more successful ab initio 3D predictors, such as Rosetta[1], may also have the side effect of producing more accurate secondary structure predictions. Unfortunately, these three approaches have not yet been shown to be superior to the established predictors in terms of the percentage of residues predicted correctly as helix, strand or coil (Q3). The best Q3 currently stands at around 76%[4], and any future improvement on this will be a strong indicator of the successful incorporation of long-range and/or folding information into the predictors.
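The two quantities used above, the i ± 7 sequence window and the Q3 score, can be illustrated with a minimal sketch. It is written in Python for illustration only. The function names, the padding symbol and the one-letter state codes (H, E, C) are assumptions, not taken from any of the cited predictors.

```python
from typing import List

# Hypothetical padding symbol for window positions that fall outside the sequence.
PAD = "-"


def windows(sequence: str, half_width: int = 7) -> List[str]:
    """Return, for each residue i, the residues i - half_width .. i + half_width."""
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed state.

    Both strings are assumed to use one letter per residue:
    H (helix), E (strand) or C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("Q3 is undefined for empty sequences")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

With the default `half_width` of 7, each window holds 15 residues. A prediction that agrees with the observed states at three of four positions gives a Q3 of 75.0.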
This paper describes another attempt to increase secondary structure prediction accuracy using long-range information. The main assumption made in this work is that some form of computer program exists, at least in theory, which can do this. Such a program might mimic the folding dynamics in some way, perhaps in one, two or three dimensions using a reduced complexity model. For example, a predictor could have a simple rule: “predict weak-strand-region as strand if number-of-already-assigned-strands ≥ 2”. This rule could, for example, be applied after assigning “strong-strand-regions”, but before assigning regions with helical sequence patterns. The rationale here is that strands might be more likely to form in the context of an already forming sheet.
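The ordered rule described above can be sketched as follows. This is a hypothetical illustration in Python, not the evolved Perl code the paper refers to. The region labels, the `(start, end, label)` tuples and the choice to let helix assignment fill only positions still marked as coil are all assumptions, and the source does not say how regions would be detected.

```python
from typing import List, Tuple

# Hypothetical region description: (start, end exclusive, label).
Region = Tuple[int, int, str]


def predict(length: int, regions: List[Region], min_strands: int = 2) -> str:
    """Assign H/E/C states in three ordered passes over pre-labelled regions."""
    states = ["C"] * length  # every residue starts as coil
    strands_assigned = 0

    # Pass 1: strong strand regions are always assigned as strand.
    for start, end, label in regions:
        if label == "strong-strand":
            states[start:end] = ["E"] * (end - start)
            strands_assigned += 1

    # Pass 2: weak strand regions become strand only if enough strands
    # have already been assigned (the rule quoted in the text).
    for start, end, label in regions:
        if label == "weak-strand" and strands_assigned >= min_strands:
            states[start:end] = ["E"] * (end - start)
            strands_assigned += 1

    # Pass 3: helical patterns are assigned last and, by assumption,
    # do not overwrite residues already assigned as strand.
    for start, end, label in regions:
        if label == "helix-pattern":
            for i in range(start, end):
                if states[i] == "C":
                    states[i] = "H"

    return "".join(states)
```

In this sketch a weak strand region next to two strong strand regions is predicted as strand, whereas the same region with only one strong strand present stays as coil. The order of the passes carries the long-range context.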